We will also encounter and make use of many data structures and data types which we have seen throughout our courses on HTS data. You can revisit this material to refresh on HTS data analysis in Bioconductor and R below.
Once the zip file in unarchived. All presentations as HTML slides and pages, their R code and HTML practical sheets will be available in the directories underneath.
Before running any of the code in the practicals or slides we need to set the working directory to the folder we unarchived.
You may navigate to the unarchived VisualisingGenomicsData folder in the Rstudio menu
Session -> Set Working Directory -> Choose Directory
or in the console.
setwd("/PathToMyDownload/VisualizingGenomicsData/viz_course/presentations/Slides")
# e.g. setwd("~/Downloads/VisualizingGenomicsData/viz_course/presentations/Slides") We can create DataTrack objects from GRanges objects and plot these alongside our GenomeAxisTrack object using the plotTracks() functions.
library(rtracklayer)
allChromosomeCoverage <- import.bw("../../Data/activatedReads.bw",
as="GRanges")
accDT <- DataTrack(allChromosomeCoverage,chomosome="chr17")
plotTracks(c(accDT,genomeAxis),
from=47504051,to=47600688,
chromosome="chr17",type="hist") When displaying genomics data it can be important to illustrate the underlying sequence for the genome being viewed.
Gviz uses SequenceTrack objects to handle displaying sequencing information.
First we need to get some sequence information for our genome of interest to display. Here we will use one of the BSgenome packages specific for hg19 - BSgenome.Hsapiens.UCSC.hg19. This contains the full sequence for hg19 as found in UCSC
library(BSgenome.Hsapiens.UCSC.hg19)
BSgenome.Hsapiens.UCSC.hg19[["chr7"]] ## 159138663-letter "DNAString" instance
## seq: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
We can also specify a DNAstringset object which we have encountered in the Bioconductor courses.
dsSet <- DNAStringSet(Hsapiens[["chr7"]])
names(dsSet) <- "chr7"
sTrack <- SequenceTrack(dsSet)
plotTracks(sTrack,from=134887024,to=134887074,
chromosome = "chr7",cex=2.5) As with IGV, the sequence can be displayed as its complement. This is performed here by setting the complement argument to the plotTracks() function to TRUE/T.
sTrack <- SequenceTrack("../../Data/chr7Short.fa")
plotTracks(sTrack,from=1,to=50,
chromosome = "chr7",complement=T,cex=3) We can also add 5’ to 3’ direction as we have for plotting GenomeAxisTrack objects using the add53 parameter. This allows for a method to illustrate the strand of the sequence being diplayed.
sTrack <- SequenceTrack("../../Data/chr7Short.fa")
plotTracks(sTrack,from=1,to=50,
chromosome = "chr7",complement=F,
add53=T,cex=2.5) Notice the 5’ and 3’ labels have swapped automatically when we have specified the complement sequence.
sTrack <- SequenceTrack("../../Data/chr7Short.fa")
plotTracks(sTrack,from=1,to=50,
chromosome = "chr7",complement=T,
add53=T,cex=2.5) We can control the size of bases with the cex parameter, as with the standard R plotting.
An interesting feature of this is that when plotted bases overlap, Gviz will provide a colour representation of bases instead of the bases’ characters.
plotTracks(sTrack,from=1,to=50,
chromosome = "chr7",cex=2.5) plotTracks(sTrack,from=1,to=50,
chromosome = "chr7",
cex=5) If we want to colour bases by own colour scheme, we can provide a named vector of our desired colours to the fontcolor parameter.
colForLetters <- c(A = "darkgrey", C = "red",
T = "darkgrey",G = "darkgrey")
plotTracks(sTrack,from=1,to=50,
chromosome = "chr7",cex=3,
fontcolor=colForLetters) The AlignmentsTrack object can be plotted in the same manner as tracks using plotTracks() function.
Since the BAM file may contain information from all chromosomes we need to specify a chromsome to plot in the chromosome parameter and here we specify the from and to parameters too.
plotTracks(peakReads,
chromosome="chr5",
from=135312577,
to=135314146) plotTracks(peakReads,
chromosome="chr5",
from=135312577,
to=135314146) By default AlignmentTracks are rendered as both the reads themselves and the calculated coverage/signal depth from these reads.
The type of plot/plots produced can be controlled by the type argument as we have done for DataTrack objects.
The valid types of plots for AlignmentsTrack objects are “pileup”, “coverage” and “sashimi” (We’ve come across sashimi plots before).
The type “pileup” displays just the reads.
plotTracks(peakReads,
chromosome="chr5",
from=135312577,
to=135314146,
type="pileup") The type “coverage” displays just the coverage (depth of signal over genomic positions) calculated from the genomic alignments.
plotTracks(peakReads,
chromosome="chr5",
from=135312577,
to=135314146,
type="coverage") As we have seen the default display is a combination of “pileup” and “coverage”.
We can provide multiple type arguments to the plotTracks() function as a vector of valid types. The order in vector here does not affect the display order in panels.
plotTracks(peakReads,
chromosome="chr5",
from=135312577,
to=135314146,
type=c("pileup","coverage")) To recapitulate this plot, we have retrieved the subsection of BodyMap data as BAM files.
First we must create two AlignmentsTrack objects, one for each tissue’s BAM file of aligned reads.
In this case since we are working with paired-end reads we must specify this by setting the isPaired parameter to TRUE
heartReads <- AlignmentsTrack("../../Data/heart.bodyMap.bam",
isPaired = TRUE)
liverReads <- AlignmentsTrack("../../Data/liver.bodyMap.bam",
isPaired = TRUE)
liverReads ## ReferenceAlignmentsTrack 'AlignmentsTrack'
## | genome: NA
## | active chromosome: chrNA
## | referenced file: ../../Data/liver.bodyMap.bam
## | mapping: id=id, cigar=cigar, mapq=mapq, flag=flag, isize=isize, groupid=groupid, status=status, md=md, seq=seq
To reproduce a plot similar to that in IGV we can simply include the “sashimi” type in the type parameter vector, here alongside “coverage”
plotTracks(c(heartReads,liverReads),
chromosome="chr12",
from=98986825,
to=98997877,
type=c("coverage","sashimi")) The AlignmentTrack object allows for specific parameters controlling how reads are displayed to be passed to the plotTracks() function.
A few useful parameters are col.gaps and col.mates or lty.gap and lty.mates which will allow us to better distinguish between gapped alignments (split reads) and gaps between read pairs respectively.
plotTracks(c(liverReads),
chromosome="chr12",
from=98986825,to=98997877,
col.gap="Red",col.mate="Blue") Similarly using lty.gap and lty.mate parameters.
plotTracks(c(liverReads),
chromosome="chr12",
from=98986825,to=98997877,
lty.gap=2,lty.mate=1) A common purpose in visualising alignment data in broswers is review information relating to mismatches to the genome which may be related to SNPs.
In order to highlight mismatches to the genome reference sequence we must first provide some information on the reference sequence.
One method for this is to attach sequence information to the AlignmentsTrack itself by providing a SequenceTrack object to referenceSequence parameter in the AlignmentsTrack() constructor. Here we can create the SequenceTrack object ffor mouse.
library(BSgenome.Mmusculus.UCSC.mm10)
sTrack <- SequenceTrack(BSgenome.Mmusculus.UCSC.mm10)
activatedReads <- AlignmentsTrack("../../Data/activatedSNPread.bam",
isPaired = TRUE,
referenceSequence=sTrack) We could also specify the SequenceTrack in the plotTracks() function as shown for the liver reads example here. Here we simply include the relevant SequenceTrack object as a track to be plotted alongside the BAM.
plotTracks(c(activatedReads,sTrack),
chromosome="chr6",
from=124815373,to=124815412,cex=2) The biomaRt Bioconductor package to programatically query various Biomarts.
Here we construct a simple BiomartGeneRegionTrack object using the parameters to define locations of interest - “chromsome”, “start”,“end”,“genome” as well as the Biomart to use, in this case Ensembl by setting the name parameter.
bgrTrack <- BiomartGeneRegionTrack(genome="hg19",
start=26591341,
end=27034958,
chromosome = "chr7",
name="ENSEMBL") We can also specify filters in the BiomartGeneRegionTrack() constructor using the filter parameter.
Gviz uses the BiomaRt Bioconductor package to query the Biomarts so we can review availble options using BiomaRt’s useMart() function to list available data collections and listDatasets() function to review all available datasets within the ENSEMBL_MART_ENSEMBL mart.
library(biomaRt)
martList <- listMarts()
mart = useMart("ENSEMBL_MART_ENSEMBL")
dataList <- listDatasets(mart)
mart = useMart("ENSEMBL_MART_ENSEMBL", dataset="hsapiens_gene_ensembl")
filterList <- listFilters(mart)Once we have retrieved our filtered gene models we can plot them as before.
plotTracks(bgrTrack) To understand which tables are available we can query the rtracklayer package to identify track and table names.
We can use the broswerSession() function to connect to UCSC and the genome() function to establish the genome of interest.
We can then review available tracks using the trackNames() function.
library(rtracklayer)
session <- browserSession()
genome(session) <- "hg19"
trackNames(session) ## Base Position Alt Haplotypes
## "ruler" "altLocations"
## Assembly BAC End Pairs
## "gold" "bacEndPairs"
## BU ORChID Chromosome Band
## "wgEncodeBuOrchid" "cytoBand"
## deCODE Recomb ENCODE Pilot
## "decodeRmap" "encodeRegions"
## FISH Clones Fosmid End Pairs
## "fishClones" "fosEndPairs"
## Gap GC Percent
## "gap" "gc5Base"
## GRC Incident GRC Map Contigs
## "grcIncidentDb" "ctgPos2"
## GRC Patch Release Hg18 Diff
## "altSeqComposite10" "hg19ContigDiff"
## Hg38 Diff Hi Seq Depth
## "hg38ContigDiff" "hiSeqDepth"
## INSDC LRG Regions
## "ucscToINSDC" "lrg"
## Map Contigs Mappability
## "ctgPos" "wgEncodeMapability"
## Recomb Rate RefSeq Acc
## "recombRate" "ucscToRefSeq"
## Restr Enzymes Short Match
## "cutters" "oligoMatch"
## STS Markers Wiki Track
## "stsMap" "wikiTrack"
## UCSC Genes NCBI RefSeq
## "knownGene" "refSeqComposite"
## AceView Genes AUGUSTUS
## "acembly" "augustusGene"
## CCDS CRISPR...
## "ccdsGene" "crispr"
## Ensembl Genes EvoFold
## "ensGene" "evofold"
## Exoniphy GENCODE...
## "exoniphy" "wgEncodeGencodeSuper"
## Geneid Genes Genscan Genes
## "geneid" "genscan"
## H-Inv 7.0 IKMC Genes Mapped
## "hinv70Composite" "hgIkmc"
## lincRNAs... LRG Transcripts
## "lincRNAs" "lrgTranscriptAli"
## MGC Genes N-SCAN
## "mgcFullMrna" "nscanGene"
## Old UCSC Genes ORFeome Clones
## "knownGeneOld6" "orfeomeMrna"
## Other RefSeq Pfam in UCSC Gene
## "xenoRefGene" "ucscGenePfam"
## Retroposed Genes SGP Genes
## "ucscRetroAli5" "sgpGene"
## SIB Genes sno/miRNA
## "sibGene" "wgRna"
## TransMap... tRNA Genes
## "transMapV4" "tRNAs"
## UCSC Alt Events UniProt
## "knownAlt" "uniprot"
## Vega Genes Yale Pseudo60
## "vegaGeneComposite" "pseudoYale60"
## Publications ClinGen CNVs
## "pubs" "iscaComposite"
## ClinVar Variants Coriell CNVs
## "clinvar" "coriellDelDup"
## COSMIC Regions DECIPHER CNVs
## "cosmicRegions" "decipher"
## Development Delay GAD View
## "cnvDevDelay" "gad"
## Gene Interactions GeneReviews
## "interactions" "geneReviews"
## GWAS Catalog HGMD Variants
## "gwasCatalog" "hgmd"
## Lens Patents LOVD Variants
## "patSeq" "lovd"
## MGI Mouse QTL OMIM Alleles
## "jaxQtlMapped" "omimAvSnp"
## OMIM Genes OMIM Pheno Loci
## "omimGene2" "omimLocation"
## RGD Human QTL RGD Rat QTL
## "rgdQtl" "rgdRatQtl"
## SNPedia UniProt Variants
## "snpedia" "spMut"
## Web Sequences CGAP SAGE
## "pubsBingBlat" "cgapSage"
## Gene Bounds H-Inv
## "rnaCluster" "HInvGeneMrna"
## Human ESTs Human mRNAs
## "est" "mrna"
## Human RNA Editing Other ESTs
## "darned" "xenoEst"
## Other mRNAs Poly(A)
## "xenoMrna" "polyA"
## PolyA-Seq SIB Alt-Splicing
## "polyASeqSites" "sibTxGraph"
## Spliced ESTs UniGene
## "intronEst" "uniGene_3"
## GTEx Gene GTEx Transcript
## "gtexGene" "gtexTranscExpr"
## Affy Exon Array Affy GNF1H
## "affyExonArray" "affyGnf1h"
## Affy RNA Loc Affy U95
## "wgEncodeAffyRnaChip" "affyU95"
## Affy U133 Affy U133Plus2
## "affyU133" "affyU133Plus2"
## Allen Brain Burge RNA-seq
## "allenBrainAli" "burgeRnaSeqGemMapperAlign"
## CSHL Small RNA-seq ENC Exon Array...
## "wgEncodeCshlShortRnaSeq" "wgEncodeExonArraySuper"
## ENC ProtGeno... ENC RNA-seq...
## "wgEncodeProtGenoSuper" "wgEncodeRnaSeqSuper"
## GIS RNA PET GNF Atlas 2
## "wgEncodeGisRnaPet" "gnfAtlas2"
## GWIPS-viz Riboseq Illumina WG-6
## "gwipsvizRiboseq" "illuminaProbes"
## PeptideAtlas qPCR Primers
## "peptideAtlas2014" "qPcrPrimers"
## RIKEN CAGE Loc Sestan Brain
## "wgEncodeRikenCage" "sestanBrainAtlas"
## ENCODE Regulation... GTEx Combined eQTL
## "wgEncodeReg" "gtexEqtlCluster"
## GTEx Tissue eQTL CD34 DnaseI
## "gtexEqtlTissue" "eioJcviNAS"
## CpG Islands... ENC Chromatin...
## "cpgIslandSuper" "wgEncodeChromSuper"
## ENC DNA Methyl... ENC DNase/FAIRE...
## "wgEncodeDnaMethylSuper" "wgEncodeDNAseSuper"
## ENC Histone... ENC RNA Binding...
## "wgEncodeHistoneSuper" "wgEncodeRbpSuper"
## ENC TF Binding... FSU Repli-chip
## "wgEncodeTfBindingSuper" "wgEncodeFsuRepliChip"
## Genome Segments NKI Nuc Lamina...
## "wgEncodeAwgSegmentation" "laminB1Super"
## ORegAnno Stanf Nucleosome
## "oreganno" "wgEncodeSydhNsome"
## SUNY SwitchGear SwitchGear TSS
## "wgEncodeSunySwitchgear" "switchDbTss"
## TFBS Conserved TS miRNA sites
## "tfbsConsSites" "targetScanS"
## UCSF Brain Methyl UMMS Brain Hist
## "ucsfBrainMethyl" "uMassBrainHistone"
## UW Repli-seq Vista Enhancers
## "wgEncodeUwRepliSeq" "vistaEnhancers"
## Conservation Cons 46-Way
## "cons100way" "cons46way"
## Cons Indels MmCf Evo Cpg
## "consIndelsHgMmCanFam" "evoCpg"
## GERP phastBias gBGC
## "allHg19RS_BW" "phastBias"
## Primate Chain/Net Placental Chain/Net
## "primateChainNet" "placentalChainNet"
## Vertebrate Chain/Net 5% Lowest S
## "vertebrateChainNet" "ntSssTop5p"
## H-C Coding Diffs Neandertal Methyl
## "ntHumChimpCodingDiff" "neandertalMethylation"
## Neandertal Seq S SNPs
## "ntSeqReads" "ntSssSnps"
## Sel Swp Scan (S) Denisova Methyl
## "ntSssZScorePMVar" "denisovaMethylation"
## Denisova Seq Denisova Variants
## "dhcBamDenisova" "dhcVcfDenisovaPinky"
## Mod Hum Variants Modern Derived
## "dhcVcfModern" "dhcHumDerDenAnc"
## Common SNPs(150) Common SNPs(147)
## "snp150Common" "snp147Common"
## Common SNPs(146) Common SNPs(144)
## "snp146Common" "snp144Common"
## Common SNPs(142) Common SNPs(141)
## "snp142Common" "snp141Common"
## All SNPs(150) All SNPs(147)
## "snp150" "snp147"
## All SNPs(146) All SNPs(144)
## "snp146" "snp144"
## All SNPs(142) All SNPs(141)
## "snp142" "snp141"
## Flagged SNPs(150) Flagged SNPs(147)
## "snp150Flagged" "snp147Flagged"
## Flagged SNPs(146) Flagged SNPs(144)
## "snp146Flagged" "snp144Flagged"
## Flagged SNPs(142) Flagged SNPs(141)
## "snp142Flagged" "snp141Flagged"
## Mult. SNPs(150) Mult. SNPs(147)
## "snp150Mult" "snp147Mult"
## Mult. SNPs(146) Mult. SNPs(144)
## "snp146Mult" "snp144Mult"
## Mult. SNPs(142) Mult. SNPs(138)
## "snp142Mult" "snp138Mult"
## All SNPs(138) Common SNPs(138)
## "snp138" "snp138Common"
## Flagged SNPs(138) 1000G Ph1 Accsbl
## "snp138Flagged" "tgpPhase1Accessibility"
## 1000G Ph1 Vars 1000G Ph3 Accsbl
## "tgpPhase1" "tgpPhase3Accessibility"
## 1000G Ph3 Vars DGV Struct Var
## "tgpPhase3" "dgvPlus"
## EVS Variants ExAC
## "evsEsp6500" "exac"
## Genome Variants GIS DNA PET
## "pgSnp" "wgEncodeGisDnaPet"
## HAIB Genotype HapMap SNPs
## "wgEncodeHaibGenotype" "hapmapSnps"
## HGDP Allele Freq SNP/CNV Arrays
## "hgdpGeo" "genotypeArrays"
## RepeatMasker Interrupted Rpts
## "rmsk" "nestedRepeats"
## Microsatellite NumtS Sequence
## "microsat" "numtSeq"
## Segmental Dups Self Chain
## "genomicSuperDups" "chainSelf"
## Simple Repeats
## "simpleRepeat"
ucscTrack <- UcscTrack(genome = "hg19",
chromosome = "chr7",
track = "ensGene",
from = 26591341,
to = 27034958,
trackType = "GeneRegionTrack",
rstarts = "exonStarts",
rends = "exonEnds",
gene ="name",
symbol = "name2",
transcript = "name",
strand = "strand"
) Now we can compare the Ensembl gene builds from the two different sources.
Notable differences in the annotation include the absense of some transcipts due to the additional filter applied in our BiomartGeneRegionTrack object creation.
plotTracks(c(bgrTrack,ucscTrack),
from = 26591341,to = 27034958) conservationTrack <- UcscTrack(genome = "hg19",
chromosome = "chr5",
track = "Conservation",
table = "phyloP100wayAll",
from = 135313003,to = 135313570,
trackType = "DataTrack",
start = "start", end = "end",
data = "score",type = "hist", window = "auto",
col.histogram = "darkblue",fill.histogram = "darkblue",
ylim = c(-3.7, 4),name = "Conservation") Time for solutions! Link here